16:18
2026-06-18
lesswrong.com
ai-safety
Your Model Organisms Might Be Fried
Arcadia Alignment's research reveals that current AI model organisms used to study alignment pathologies suffer from degraded coherence, instruction-following, and reasoning, making them poor proxies …